fix(portal): security hardening from v0.2.0 post-merge audit#101
Merged
Conversation
Addresses three P0 findings from the post-merge review of the B1 embeddable portal stack: 1. PortalEndpointsController gains a controller-level [EnableRateLimiting] attribute so portal mutating routes share the per-tenant token-bucket budget. Previously a leaked token could spam /test (real outbound HTTP POST) without limit. 2. PortalTokenAuthMiddleware switches from a static JwtSecurityTokenHandler to a per-instance one with MapInboundClaims=false and an options-driven MaximumTokenSizeInBytes (PortalAuthOptions.MaxTokenSizeBytes, default 8 KiB). Defeats the .NET default 250 KiB DoS amplification surface and removes per-request claim mapping overhead. 3. PortalLookupCache.Set now atomically AddOrUpdate-swaps the per-app CancellationTokenSource and cancels+disposes the previous one, instead of GetOrAdd-reusing it. Closes a race between Set and InvalidateApplication that could bind a fresh cache entry to a disposed token. Adds Portal_Request_With_Oversized_Token_Returns_401_Without_Parsing regression test to pin the size cap.
This was referenced May 11, 2026
voyvodka
added a commit
that referenced
this pull request
May 11, 2026
…it doc-drift) (#103) Closes the documentation half of the v0.2.0 portal audit. Tur 1 (#101) was the security fix, Tur 2 (#102) was the test coverage; this PR is the doc drift the same audit surfaced. docs/API.md §3.8 — Portal API (Customer-Facing JWT): - HS256 JWT contract: algorithm pin, signing key per-app, lifetime cap, clock skew, token size cap, required + optional claims. - Capability table: endpoints:read|write|test, attempts:read. - Per-app CORS rules: no wildcards, https-only, RFC 6454 case- insensitive matching, preflight semantics. - Rate limit: shares send-by-appid partition; cross-tenant lookups return 404 (never 403, which would leak existence). - All 10 portal routes documented with request/response shape. - 5 dashboard portal-admin routes documented. - Portal-specific error code table. - End-to-end probe with jose (Node.js mint) + cURL. docs/ARCHITECTURE.md §4.3 — Portal Token Authentication: - Per-application secrets stored on Application (PortalSigningKey, AllowedPortalOriginsJson, PortalRotatedAt). - Pipeline ordering with the three invariants it encodes (ApiKeyAuth bypass, PortalToken-before-PortalCors, both-before-RateLimiter). - PortalLookupCache: TTL, instant local invalidation, atomic CTS swap. - Cross-tenant isolation via 2-arg GetByIdAsync. - JWT validator defense-in-depth (HS256 pin, 8 KiB token cap, MapInboundClaims=false, lifetime cap, opaque error bodies).
4 tasks
voyvodka
added a commit
that referenced
this pull request
May 11, 2026
…alidator merge, PUT→PATCH, disable preserves origins) (#104) Closes the four P1 behaviour findings from the v0.2.0 portal audit. Tur 1 (#101) shipped the P0 security fixes, Tur 2 (#102) shipped the test coverage, Tur 3 (#103) shipped the docs; this Tur 4 closes the behaviour delta. 1. PortalCorsMiddleware deny-cache. HandlePreflightAsync now caches both allow and deny outcomes via IMemoryCache for the same TTL as the per-app signing-key lookup (default 60s). Browsers don't cache rejected preflights, so every OPTIONS from a disallowed origin used to re-scan the portal-enabled app set + deserialize the JSON allowlist — a free DB hammer vector for any caller that knew the portal prefix. Cache key is lowercased-origin scoped (RFC 6454 §4 case-insensitive). New test Preflight_Deny_Decision_Is_Cached_Within_Ttl pins the behaviour by mutating the DB to allow the origin after a 403 and asserting the second call still 403s within the TTL window. 2. EndpointValidationRules helper consolidation. New extension methods (EndpointUrlSyntax, EndpointDescription, EndpointTransformExpression, EndpointCustomHeaders, EndpointAllowedIpsCidrs, EndpointSecretOverride) become the single source of truth for the field-shape rules shared between the 4 admin endpoint validators and the 2 portal endpoint validators. Without this consolidation, tightening a rule on one surface silently leaves the other surface weaker — exactly the drift pattern the audit flagged. Async DNS host-safety check stays in each validator (DependentRules + CustomAsync needs the full property selector). Behaviour unchanged. 3. PortalEndpointsController.Update → [HttpPatch]. The action's body semantics were always partial-replace (every field optional, only non-null fields applied) — that's PATCH, not PUT. Switching the verb aligns the wire surface with reality and stops misleading REST consumers that expect PUT to be full-replace. Route, request shape, response shape unchanged. Two existing tests ported from PutAsJsonAsync to PatchAsync + JsonContent.Create. 4. DashboardPortalController.Disable preserves origins. Removed the line that nulled AllowedPortalOriginsJson on disable. Disable now revokes only the auth surface (PortalSigningKey, PortalRotatedAt) and keeps the operator-curated CORS allowlist so re-enable doesn't force re-curation. Explicit clear path remains: PUT /portal/origins with {origins: []}. Renamed test Disable_Clears_SigningKey_And_Origins → Disable_Clears_SigningKey_But_Preserves_Origins with assertion flip.
4 tasks
voyvodka
added a commit
that referenced
this pull request
May 11, 2026
…e TTL) (#105) Locks in two v0.2.0 audit decisions that were marked 'worth recording' in the reviewer agent's punch list. Both are pure documentation — behaviour was already shipped in PRs #101 and #104. ADR-004 — Portal Signing Key Storage: - Plaintext varchar(64) at rest, mirroring the existing SigningSecret column. No application-level encryption; relies on operator's Postgres deployment posture. - Instant invalidation on rotate / disable. No grace window for overlapping old + new key validity. - One-shot reveal on enable / rotate; subsequent reads return only portalEnabled bool + rotated-at + origins (never the key). - Documents alternatives (pgcrypto, rotation grace, external KMS) and revisit triggers (compliance regimes, host-integration friction, key reuse beyond JWT verification). ADR-005 — Portal CORS Preflight Deny-Cache TTL: - 60s default (LookupCacheTtlSeconds reuse); cache both allow and deny outcomes; key is lowercased-origin scoped. - No synchronous invalidation hook from PUT /portal/origins: cache key is origin-scoped, operator action is app-scoped — bookkeeping cost outweighs the <=60s staleness. - Documents alternatives (per-app invalidation, shorter/longer TTL, negative-only cache) and revisit triggers (operator UX complaints, memory pressure via origin enumeration).
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Post-merge audit of the B1 embeddable customer portal surfaced three P0 findings. This PR closes all three in one focused fix.
Findings addressed
Rate limit bypass —
PortalEndpointsControllercarried no[EnableRateLimiting]attribute. A leaked or misbehaving portal token could spam mutating routes (notably/test, which fires a real outbound HTTP POST) without sharing the per-tenant token-bucket budget that the public API has always enforced.→ Controller-level
[EnableRateLimiting(\"send-by-appid\")]reuses the existing partition;PortalTokenAuthMiddlewarealready populatesHttpContext.Items[\"AppId\"]upstream ofRateLimiter.JWT handler DoS amplification —
PortalTokenAuthMiddlewareused a staticJwtSecurityTokenHandlerwith the .NET defaults (MapInboundClaims = true,MaximumTokenSizeInBytes ≈ 250 KiB). A multi-hundred-KB Bearer payload was parsed before being rejected.→ Per-instance handler with
MapInboundClaims = false(we read rawappId/capabilitiesclaim keys, so mapping is pure overhead) andMaximumTokenSizeInBytes = PortalAuthOptions.MaxTokenSizeBytes(default 8 KiB). Portal tokens are typically 0.5-2 KB; 8 KiB leaves comfortable headroom.PortalLookupCache invalidation race —
Set()usedGetOrAddto reuse the per-appCancellationTokenSource. After anInvalidateApplicationcancelled+disposed it, the nextSetwould bind a fresh cache entry to that disposed token (CancellationChangeTokenover a disposed source → entry expires immediately, or worse, throws on a tight race).→
Setnow atomicallyAddOrUpdate-swaps the CTS and cancels+disposes the previous one. BothSetandInvalidateApplicationroute through a sharedCancelAndDisposehelper that swallowsObjectDisposedExceptionon the concurrent-double-dispose edge.Tests
Portal_Request_With_Oversized_Token_Returns_401_Without_Parsingpins the 8 KiB cap (sends 16 KiB Bearer payload, expects401 PORTAL_AUTH_INVALID_TOKEN).Test plan
dotnet build WebhookEngine.sln --configuration Release— 0 warnings, 0 errorsdotnet test— 253/253 passing (Core 34, API 108, Worker 46, Infrastructure 65)Follow-ups (tracked, not in this PR)
AnyAllowsPortalOriginAsyncTestcontainers,PortalLookupCacheTTL/invalidate)docs/API.mdanddocs/ARCHITECTURE.mdportal sections)